The impact of Pleistocene glaciations and environmental gradients on the genetic structure of Embothrium coccineum

Abstract The South American temperate forests were subjected to drastic topographic and climatic changes during the Pliocene‐Pleistocene as a consequence of the Andean orogeny and glacial cycles. Such changes are common drivers of genetic structure and adaptation. Embothrium coccineum (Proteaceae) is an emblematic tree of the South American temperate forest (around 20°S of latitude) that has strongly been affected by topographic and climatic events. Previous studies have shown a marked genetic structure in this species, and distinct ecotypes have been described. Yet, little is known about their adaptive genetic responses. The main goal of this study was to investigate the effects of historical and contemporary landscape features affecting the genetic diversity and connectivity of E. coccineum throughout its current natural distribution. Using over 2000 single nucleotide polymorphisms (SNPs), we identified two genetic groups (a Northern and a Central‐Southern group) that diverged around 2.8 million years ago. The level of genetic structure was higher among populations within the Northern genetic group than within the Central‐Southern group. We propose that these differences in genetic structure may be due to differences in the assemblages of pollinators and in the evolutionary histories of the two genetic groups. Moreover, the data displayed a strong pattern of isolation by the environment in E. coccineum, suggesting that selection could have led to adaptive divergence among localities. We propose that in the Chilean temperate forest, the patterns of genetic variation in E. coccineum reflect both a Quaternary phylogenetic imprint and signatures of selection as a consequence of a strong environmental gradient.


| INTRODUC TI ON
South American temperate forests are considered of great ecological and evolutionary importance because of their isolation and high degree of endemism (Smith-Ramírez et al., 2007). In Chile, temperate forests stretch from 35°S down to 55°S. These form a continuous area that can be described as a "biogeographic island" because it is separated from its ancestral sources of biota by nearly impassable barriers such as deserts, mountains, and oceans (Villagrán, 1991).
As a result, the climate and topography characterizing the narrow strip (2000 km long and 120 km wide) of Chilean temperate forests comprise a highly heterogeneous landscape of multiple forest and soil types, climatic conditions, and disturbance regimes (Loguercio et al., 2018). Adaptive processes allowing persistence in a mosaic of different habitats may have played an important role in the diversification of plant species in the temperate forests of southern South America (Johnson et al., 2014;Prunier et al., 2017). In the same way, the pattern of genetic divergence within a single species, with an extensive distribution over a strong environmental gradient, can also be marked by selection (Grossiord et al., 2004;Saldana et al., 2007;Zeppel et al., 2014). Embothrium coccineum, J.R. Forts. & G. Forst (Dimitri, 1972) is an endemic species of Gondwanan origin with a distribution range that covers the extent of the temperate forest biome throughout Chile and Argentina (Alberdi & Donoso, 2004;Chalcoff et al., 2008;Zegers, 1994). In Embothrium, the presence of distinct ecotypes has been linked to differences in water accessibility, and the amount of snow coverage and the distribution of its ecotypes correspond to the genetic structure found using isoenzymatic genetic markers (Souto & Premoli, 2007). This correlation suggests that E. coccineum populations may be locally adapted to distinct environmental conditions across its distribution range (Bertin-Benavides et al., 2020;Souto & Premoli, 2007).
In addition to the possible effect of current selective regimes, climate shifts during the Pleistocene have also strongly altered the spatial patterns of genetic variation of many southern South American taxa (Turchetto-Zolet et al., 2013;Vidal-Russell et al., 2011). During the Last Glacial Maximum (LGM, 25,000 years Before Present, BP), a large expanse of temperate forest disappeared under thick ice caps between 42°S and 56°S (Rabassa et al., 2005). Glacial maxima rapidly forced most temperate South American species into glacial refugia north of 42°S (Allnutt et al., 1999;Premoli et al., 2000;Sersic et al., 2011). Some, more cold-tolerant species, such as E. coccineum, were also able to survive within the regions affected by glaciation, in refugia where the microclimate, the topography, and geothermal activity hindered the formation of ice caps (Allnutt et al., 1999;Premoli et al., 2000;Sersic et al., 2011). These areas may also have been sources for the restoration of genetic diversity after deglaciation (Comps et al., 2001). Lastly, fossils of pollens from coldtolerant species confirm their presence at 11,000 years BP in areas covered by ice during the LGM (Fesq-Martin et al., 2004;Steubing et al., 1983) and genetic studies using chloroplast gene sequences or nuclear isoenzymatic markers (Souto & Premoli, 2007;Vidal-Russell et al., 2011) further suggest an overall complex glacial-interglacial history in E. coccineum.
Recent developments in high-throughput sequencing technologies and population genomics have allowed to distinguish between the influence of historical and ecological processes on the spatial genetic patterns and variation in South American plants and animals (Hasbún et al., 2016;Turchetto-Zolet et al., 2013;Varas-Myrik et al., 2022). If divergent natural selection led to genetic differentiation among populations of E. coccineum along the characteristic environmental gradients of South American temperate forests, a pattern of isolation by the environment (IBE; Wang & Bradburd, 2014) should be expected. In fact, when selection against migration limits gene flow among populations from distinct environments, genetic differentiation increases with environmental differences (Wang & Bradburd, 2014;Wang & Summers, 2010). To test for the existence of IBE, the level of correlation between genetic differentiation and environmental distances should be compared with the correlation between genetic differentiation and geographic distances to assess which one better explains the genetic structuring of the species. The latter correlation is defined as isolation by distance (IBD; Slatkin, 1993) and can be considered as a null model for which genetic differentiation increases with geographical distance given the local accumulation of genetic differences when the dispersion between populations is geographically restricted. In E. coccineum, gene flow could for instance be limited by constraints in pollen transport linked to the distribution and behavior of pollinators (i.e., more than 20 species, including birds and insects, are reported as pollinators of E. coccineum; Chalcoff et al., 2008) and/or anemochorous seed dispersal (Rovere & Premoli, 2005).
The main goal of this study was to evaluate the effects of historical and contemporary landscape features affecting the genetic diversity and connectivity of E. coccineum throughout its natural distribution using single nucleotide polymorphisms (SNPs) obtained by genotyping-by-sequencing (GBS). Patterns of genetic structure E. coccineum reflect both a Quaternary phylogenetic imprint and signatures of selection as a consequence of a strong environmental gradient.

K E Y W O R D S
divergence time, isolation by environment, last glacial maxima, population genomics, Proteaceae, temperate forest

T A X O N O M Y C L A S S I F I C A T I O N
Evolutionary ecology, Population genetics were evaluated independently using all SNP loci (i.e., neutral and outlier loci) and those only found in genomic regions that may have been subjected to selection (i.e., only outlier loci). We hypothesized that patterns of genetic variation in E. coccineum will reflect the impact of historical processes (i.e., glacial/interglacial cycles during the Pleistocene) and the impact of contemporary landscape features, including the selective effects of environmental heterogeneity.

| Samples collection and DNA extraction
Locations were selected to cover an extensive range of climatic conditions across E. coccineum's ( Figure 1) natural distribution.
Three locations (Chillán, Nahuelbuta, and Curacautín) represented the northern area of the distribution; four locations (Puerto Montt, Chiloé Norte, Chiloé Sur, and Pumalín) the central area of the distribution; and three locations (Coyhaique, Chile Chico and Torres del Paine) the southern area of the distribution ( Figure 2a and Table S1).
These three regions are subsequently named Northern, Central, and Southern. Three to six individuals were sampled within each location and five mature leaves were collected per tree. A total of 42 trees were sampled. To calibrate divergence time among the genetic groups of E. coccineum, two individuals of Lomatia hirsuta Diels, a Proteaceae species from the Embothrieae tribe (Sauquet et al., 2009), were included as an outgroup. Mature leaves were transported in a cooler to the laboratory where they were frozen in liquid nitrogen and stored at −80°C until DNA extraction. Genomic DNA was obtained from leaf tissue (50 mg, powdered in a Precellys24 homogenizer; Precellys) using the Qiagen DNeasy Plant kit (Qiagen Inc.) following the manufacturer's instructions. The genomic DNA integrity was evaluated in 1% agarose gels, and its concentration was quantified using a Qubit fluorometer (Invitrogen).

| Climatic data collection
The historical values (from the years 1970 to 2000) of 19 environmental variables (i.e., "bio 1" to "bio 19") and the elevation of each location (Elv; from the package raster v.2.7.15 and the Worldclim data of 2.5 arcmins available at http://world clim.org/), were used to characterize/define spatial environmental variation. We tested for covariation among environmental variables using the vifcor() function from the "usdm" R package (Naimi et al., 2014). Variables presenting no strong covariation (R < 0.8; see Fitzpatrick & Keller, 2014) were kept for subsequent analyses (Table S1).

| Preparation of the library, high-throughput sequencing, and genotyping-by-sequence (GBS)
Library preparation and high-throughput sequencing took place at the University of Wisconsin Biotechnology Center (DNA Sequencing Facility). The preparation of the GBS genomic library followed the protocol in Elshire et al. (2011) using the ApeKI restriction enzyme and 44 individual-specific barcodes. High-throughput sequencing was performed in an Illumina HiSeq 2000 (Illumina) and single-strand sequencing runs of 100 bp. The quality of the sequenced raw data was assessed using FastQC (https://www.bioin forma tics.babra ham.ac.uk/proje cts/ fastq c/) and the raw GBS dataset can be found in the SRA repository (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA 783610).

| SNP calling and filtering process
Single nucleotide polymorphism calling was made using the de novo pipeline in Stacks v.2.2 (Catchen et al., 2013). Raw reads from Illumina sequencing were demultiplexed, adapters were removed, and reads with low quality scores were discarded using the pro-cess_radtag pipeline in Stacks v.2.2 (Catchen et al., 2013). In order to choose the best fitting assembly parameters and avoid possible biases in the outcome of the genetic structuring of populations due to parameter settings and/or missing data, distinct parameter sets were tested in the populations pipeline of Stacks v.2.2 (Catchen et al., 2013). Values of the two main parameters (-M: the number of mismatches allowed between stacks within individuals; and -n: the number of mismatches allowed between stacks between individuals) were chosen following the optimization procedure described in Paris et al. (2017). The minimum minor allele frequency (min-maf) was set to 0.02 and the maximum observed heterozygosity (max-obs-het) to 0.5; while the parameters -p (the minimum number of populations where a locus must be present for it to be included) and -r (the percentage of individuals in a population that must possess a particular locus for it to be included) were examined using eight combinations comprising an -r of 0.6 or 0.7 and a -p of 6, 7, 8, or 9. For each combination, the percent (%) of missing data and of heterozygosity, and the number of SNPs were estimated (data not shown). The final data matrix "Population Genetic dataset" (PG_dataset) was built using -p 9 and -r 0.7, which gave the highest number of SNPs and the lowest percentage of missing data.
An additional dataset was built to estimate the divergence time among the genetic groups of E. coccineum using a Bayesian multispecies coalescent model (Stange et al., 2018). First, we selected a subset of the northern (using locations Nah and Cu), central (using locations ChlN and ChlS), and the southern (using locations Coy and TP) locations using two individuals per location to maximize the number of SNPs and minimize the amount of missing data. These 12 E. coccineum individuals were processed jointly with the samples of L. hirsuta, as an outgroup. A new SNP calling was conducted with the same parameters used for the PG_dataset in the Stacks v.2.2 pipeline (Catchen et al., 2013), resulting in a "divergence time dataset" (DT_ dataset). For the latter dataset, optimal values for -p and -r parameters in order to decrease the percentage of missing data were -p 7 and -r 0.8. Putative outliers loci were removed from the DT_dataset.
For both datasets (PG_dataset and DT_dataset), only one randomly selected SNP per locus was kept to minimize the probability of coanalyzing linked markers.

| Populations' diversity statistics
To explore the distribution of genetic diversity, we calculated in GenoDive v.3.02 the observed (Ho) and expected (He) levels of heterozygosity and the inbreeding coefficient (F IS ) of each location. The percentage of polymorphic loci and private alleles were estimated using the R package "hierfstat v. 0.04-22" (Goudet, 2005).

| Genetic structure
Four complementary analyses were conducted to characterize the patterns of the genetic structure of E. coccineum using the PG_dataset. First, we calculated the pairwise FST values among all sampling locations using the function stamppFst with 1000 bootstrap replicates across loci to generate p-values and a confidence interval of 95% in the R package StAMPP v.1.5.1 (Pembleton et al., 2013). Second, a principal component analysis (PCA) was performed using the glPCA function in the R package adegenet v.2.1.1 (Jombart, 2008). Only the first three components were retained to estimate genetic groups. Third, to define the posterior probability of group membership for each individual, we used a model-based STRUCTURE clustering analysis performed in STRUCTURE v.2.3.4. (Falush et al., 2003). Ten independent simulations allowing admixture were run for each of 10 Ks (from 1 to 10 Ks), with 200,000 Markov chain Monte Carlo replicates (MCMC) and 100,000 samples F I G U R E 1 (a) Embothrium coccineum adult tree. Inserts are close-ups of (b) leaves, (c) flowers, (d) cluster roots, (e) follicles, and (f) seeds. Scale bars represent 1 cm. All photographs by Ariana Bertín.

| Outlier detection and association of allele frequency with environmental variables
Two complementary approaches were used to identify candidate loci influenced by selection in the PG_dataset. First, we used a PCA to generate a "model-free" null distribution of individual genetic distances to detect outlier loci. This analysis was conducted in the R packages PCAdapt v.4.0.3 (Luu et al., 2017) and q-value v.2.16.0 (Storey et al., 2004) with default parameter settings and a search based on K = 10. The detection of outliers was confirmed by plotting the histograms of p-values, and the Mahalanobis (D 2 ) test statistic using a False Discovery Rate (FDR) of 5%. Second, we implemented a Bayesian test to detect outlier loci using BAYESCAN 2.1 software (Foll et al., 2008). All simulations were performed using default parameter settings, 20 pilot runs of 5000 iterations followed by 50,000 sampling iterations, and an FDR of 5%. Outlier loci detected by both PCAdapt and BAYESCAN were chosen as candidate adaptive loci.
To search for candidate adaptive loci with allelic frequency influenced by the environmental variation in our study area (see Table S1), a GradientForest (GF) analysis was implemented in the R package gradientForest (Ellis et al., 2012). The GF provides a ranked list of the relative predictive power (R 2 ) of all environmental variables allowing for the identification of those that best explain the observed genetic variation. The allele frequencies of candidate adaptive loci (i.e., detected in both PCAdapt and BAYESCAN) were used as the response variable. The GF was fitted using 2000 regression trees per SNP and a variable correlation of 0.5 (Fitzpatrick & Keller, 2014).

| Isolation by distance and isolation by environment
To better understand the drivers potentially explaining the observed patterns of spatial genetic variation in E. coccineum, we performed a distance-based redundancy analysis (db-RDA) and a partial db-RDA at the individual level; first, on the complete PG_dataset and second, only on loci potentially influenced by selection. The RDA is a F I G U R E 2 (a) Sampling locations of Embothrium coccineum. Circles, triangles, and diamonds represent samples from the northern, central, and southern parts of the species distribution, respectively. Embothrium coccineum distribution area is highlighted in light blue and hatched lines represent ice limits from last glacial maxima. (b) Principal component analysis (PCA) based on a 2155 SNP dataset of Embothrium coccineum samples. Sampling location codes are as in Table S1. Symbols and colors used for each locality are as in Figure 2a.
multiple linear regression method performed between a matrix of dependent variables and a set of matrices for independent variables and has been demonstrated as the more appropriate method (as opposed to for example a Mantel test) when multiple variables are analyzed to identify drivers of population genetic structure (Nadeau et al., 2016;Orsini et al., 2013).
A dependent matrix calculated with a Bray-Curtis dissimilarity index (Bray & Curtis, 1957) was used as a response variable for each individual's co-dominant variant call format. Predictor matrices were geographic distance (IBD), environmental disparity (IBE), and the degree of shared co-ancestry (IBA). As environmental data, we used seven environmental variables with a low level of covariation (see Table S1). For the geographic distance matrix, we used Principal Coordinates of Neighborhood Matrices (PCNM) with a truncation threshold of 0.05 for long distances in the function pcnm of the R package vegan v.2.5.7 (Oksanen et al., 2007). PCNM is commonly used to transform spatial distances into rectangular data matrices suitable for constrained ordination or regression analyses (Borcard & Legendre, 2002). Populations' Q-values obtained with STRUCTURE at K = 2 (see Section 3 for details) were used as the co-ancestry variable. Prior to the analysis, all three independent matrices were scaled to a mean of zero and a variance of one, with the scale function in R (R Core Team, 2020).
Variation among populations of E. coccineum was partitioned into components explained by geographic distance (IBD), ecological gradients (IBE), co-ancestry (IBA), or their combination (i.e., constrained by the effects of the remaining two independent matrices), using the vartpart function in the R package vegan v.2.5.7 (Oksanen et al., 2007). The significance of each partition was tested with the anova.cca function and 1000 permutations in the R package vegan v.2.5.7 (Oksanen et al., 2007).

| Divergence time among the genetic groups of Embothrium coccineum
Using the DT_dataset, we implemented a Bayesian multispecies coalescent model to estimate divergence times among the genetic groups of E. coccineum (Rannala & Yang, 2003). This analysis was conducted with the coalescent simulator SNAPP v.1.3.0 package (Bryant et al., 2012) of BEAST v.2.4.5 software (Bouckaert et al., 2014). We followed a model accommodated for genome-wide SNP data, which implements a strict molecular clock and nodes calibrated with biogeographic constraints, resulting from one or more divergence events (Stange et al., 2018). This approach assumes an asymmetric substitution model and equal frequencies. It allows a single parameter to be the rate of the strict molecular clock and is currently the only model supported by SNAPP for this type of data (Stange et al., 2018).
Save a few exceptions, all parameters were set following the Github tutorial "Divergence-time estimation with SNAPP" (https:// github.com/mmats chine r/tutor ials/tree/maste r/diver gence_time_ estim ation_with_snp_data). We used a lognormal distribution for the crown age of the Embothrieae tribe, which was estimated at 66 million years (Myr; Mean = 66.6, Standard Deviation = 0.13, Mean In Real Space = TRUE) using a phylogeny of Proteaceae that included all the genera currently recognized in the family (Sauquet et al., 2009).
This estimate was used to approximate the node age of the common ancestor of E. coccineum in our dataset, which was derived from the sister relationship to the Lomatia samples. Additionally, a generation time of 10 years per individual was chosen, following experimental greenhouse observations (Bertin-Benavides A., personal observation). Alpha and beta parameters were left with a uniform prior of an arbitrary value of 1.0 (Stange et al., 2018).
Species' trees were inferred directly with no prior grouping assignment. We used SNAPP with four independent runs of 2000,000 generations, sampling every 200 generations. The results (i.e., Effective Sample Size, ESS > 300) were checked for mixing all parameters with Tracer v1.7 (Rambaut et al., 2018) and for convergence of split frequencies among runs ( Figure S2) with the R package rwty (Warren, 2019). The final trees from each run (corresponding to the 95% highest posterior density [HPD] after discarding a 10% burn-in of the tree topologies) were combined and subsequently summarized using a maximum credibility tree with TreeSetAnalyser v.2.4.5.

| RE SULTS
In order to evaluate the population structure, the divergence time among genetic groups, and the local adaptation in E. coccineum, 38 trees were genotyped using GBS (4 of 42 individuals were removed due to the low number of retained reads). A total of 116,606,122 reads were generated and de novo assembled to 1,303,750 loci, from which 473,461 SNPs were called. The application of filtering criteria resulted in a final PG_dataset of 2155 SNPs. For the DT_ dataset, comprising 12 individuals of E. coccineum and two individuals of L. hirsuta, a total of 320 SNPs were called.

| Genetic diversity
The expected and observed heterozygosity (He and Ho, respectively), the inbreeding coefficient (F IS ), the percentages of polymorphic loci (PPL), and the number of private alleles (PA) are given in

| Patterns of population genetic differentiation
The highest pairwise FST values were observed between locations from the northern and southern regions (0.319 < FST < 0.424), followed by those calculated between locations from southern and central regions (0.222 < FST < 0.429), and then by the ones calculated between locations from Central and Southern regions (0.070 < FST < 0.193). The levels of divergence between locations within the same region were generally lower (northern: 0.089 < FST < 0.174; central 0.010 < FST < 0.152; southern: 0.043 < FST < 0.061; Table 2). All pairwise FST values were significant (p < .05).
The PCA showed that E. coccineum was divided into two major genetic groups: the Northern and the Central-Southern group  In addition, GF also produced individual turnover functions for each potentially adaptive loci having an R 2 > 0 (Fitzpatrick & Keller, 2014). This allowed listing the potentially adaptive loci and the highest correlation of allelic frequencies with each of the four environmental factors of the highest impact on E. coccineum. The results revealed that the allelic frequencies of two outliers (S92192_43 and S60272_8) strongly correlated with the factor precipitation of the driest month (bio 14; Figure S3a: pink and blue lines, respectively).

| Genome scan for outlier loci detection
The allelic frequency of one of those outliers (S60272_8) also exhibited a weak response to the mean temperature of the driest quarter (bio 9; Figure S3d). Three other outliers (S27842_67, S56778_42, and S83147_54) exhibited a significant and concurrent response to the mean temperature of the wettest quarter (bio 8; Figure S3b: green, calypso, and purple lines, respectively); and the allelic frequency of two of those outliers (S272842_67 and S56778_42) also showed a weak response to elevation and mean temperature of the wettest quarter (bio 9; Figure S3c, S83147_54) gradually changed with latitude ( Figure 5b); while the allelic frequencies among locations sampled in the northern region (i.e., Ch, Nah, and Cu) were broadly similar. The remaining locus (S92192_43) had a fixed private allele in the Curacautín (Cu) location. We retrieved the sequences of these outlier loci using the population function of STACKs v2.2 to blast them against the NCBI (www.ncbi.nlm.nih.gov) database using BLASTn, but no significant alignments were found (data not shown).  Table S1.
F I G U R E 3 Structure analyses, for K = 2 (a) and K = 4 (b) of 38 Embothrium coccineum individuals based on a 2155 SNP dataset. Each individual is displayed from north to south and each individual is represented by a vertical bar. Sampling location codes are as in Table S1.
Color corresponds to genetic group, and the proportion of the genome assigned to each genetic group is given along Y-axis.

| Isolation by distance and isolation by environment
The full RDA model using the PG_datset explained 59.4% of the total genetic variation and supported the idea of a strong environmental (Env), geographic (Geo), and co-ancestry (Anc) influence on the allelic variation (Env: adjusted R 2 = .420, p < .001; Geo: adjusted R 2 = .226, p < .001; Anc: adjusted R 2 = .105, p < .001; Table 3). With the partial db_RDA, the contribution of the environmental variables to genetic divergence was much higher than the contribution of the geographic distance or the co-ancestry (Env | Geo + Anc: adjusted R 2 = .288; Geo | Env + Anc: adjusted R 2 = .118; Anc | Env + Geo: adjusted R 2 = .011; Table 3). In total, 0.178 of the explained variation was confounded between the effects of IBE, IBD, and IBA; 0.405 of the variation remained unexplained ( Table 3). Similar results were obtained for the subset of 59 outlier loci (potentially under selection). The partial db-RDA highlighted the important contribution of environmental variables to the potential local adaptation of E. coccineum (Env | Geo + Anc: adjusted R 2 = .244; Table 3).

| Divergence time
The divergence time between the Northern and Central-Southern

| DISCUSS ION
In the present study, samples of E. coccineum from across its distribution range along the Chilean temperate forest were genotyped through GBS to investigate patterns of genetic variation and their possible driv-  (Premoli et al., 2000). This heterogeneity could also have led to local adaptive divergence in E. coccineum. Indeed, the strong IBE pattern we detected in E. coccineum could reflect the impact of a gradient in temperature, precipitation, and elevation (which characterizes the species' distribution range) on its genetic diversity.

F I G U R E 4 NJ tree reconstructed based on Nei's distance between Embothrium coccineum individuals. NJ reconstruction based on (a) the
PG_dataset of 2155 SNPs, and (b) the 59 outlier loci putatively under selection. Sampling location codes are as in Table S1. Nodes with high support (>0.95) are filled in black.

| Genetic differentiation within Embothrium coccineum: two genetic groups separated since the early Pleistocene
Our results confirm the presence of two genetic groups in E. coccineum, one in the Northern and one in the Central-Southern part of the species distribution. These results support previous studies, yet at different levels of spatial resolution (Souto & Premoli, 2007;Vidal-Russell et al., 2011). The alloenzyme study, for instance, revealed only a weak geographic structure, where some northern and southern locations clustered together (Vidal-Russell et al., 2011).
The authors explained this pattern by proposing a possible effect of F I G U R E 5 Predictor overall importance plot and change in allelic frequency of five outliers loci with latitude. (a) The R 2 weighted importance of environmental variables with the low level of covariance explaining the genetic variation across sampling locations as obtained from GF analysis and (b) change in allele frequency between sampling localities, organized from north to south, for the five potentially adaptive loci (i.e, loci: S92192_43, S60272_8, S27842_67, S56778_42, and S83147_54) showing major response to environmental variables. Environmental variables used in the gradientForest analyses are the same as in Table S1; bio 14, precipitation of the driest month (mm); bio 8, mean temperature of the wettest quarter (°C); elevation, elevation (masl); bio 9, mean temperature of the driest quarter (°C); bio 4, temperature seasonality (°C); bio 3, Isothermality (%); bio 13, precipitation of the wettest month (mm).

TA B L E 3
Redundancy analysis (RDA) partitioning among population genetic variation in Embothrium coccineum into three components: Environmental (Env), geographic (Geo), and northern/central-southern co-ancestry (Anc). glacial survival in multiple refugia followed by recolonization and/or the existence of actual gene flow blurring the trace of the historical geographic structure. Contrastingly, the other study based on chloroplast genetic markers revealed a distinct southern group characterized by a very low genetic variation. In the latter, two hypotheses were used to explain this pattern: (1) the existence of northern and southern glacial refugia as well as a recent colonization event from the southern to the northern part of the distribution of the species, or (2) a unique refugium in the north and a recent recolonization toward the south (Souto & Premoli, 2007).
The genomic SNP data, given their high resolution, were unambiguously able to confirm the idea of the two proposed genetic groups and further evidenced a very high genetic similarity among individuals from the central and southern regions. The divergence time estimated among locations within the Northern lineage with the genomic SNP data was much more recent (e.g., some 0.35 Myr ago) than the divergence between the two main lineages. The divergence between Northern and Central-Southern lineages was estimated at 2.8 Myr ago, which corresponds to the Late Pliocene and Early Pleistocene.
Herewith, our study provides a refined picture of the possible divergence time between the main genetic clusters of E. coccineum but also allows for an attempt on defining mechanisms that generated the patterns of genetic structure observed at a regional and a local scale.
However, the low number of localities sampled and large gaps in the species distribution not covered by our study limit our capacity to locate the boundaries between the North and Center-South genetic Southern South American forests are composed of paleo-flora with variable origins (Segovia et al., 2013;.
Fossil records indicate that during the early Paleocene, this region supported a widespread tropical flora .
During the Oligocene-Miocene transition, in response to colder and drier conditions linked to the opening of the Drake Passage, the creation of the Antarctic Circumpolar Current and the steep increase in Antarctic ice caps, a diverse mixed paleo-flora was established that included Godwanan species preadapted to cool temperate conditions (Poole et al., 2003;Zachos et al., 2001). Nonetheless, the Pleistocene glacial cycles have dramatically altered the landscape of southern South America (Martínez & Kutschker, 2011;Ponce et al., 2011;Ramos & Ghiglione, 2008) and impacted the composition of the local biota . Currently, only one-third of the woody genera in southern South America are related to ancient paleo-flora that occupied southern South America in pre-Quaternary times (Hinojosa et al., 2006;Wardle et al., 2001). Palynological records and comparative phylogeographic studies in southern South America propose the existence of various areas where terrestrial plants and animals survived during glacial maxima of the Quaternary. Various studies (e.g., Markgraf, 1983;Vidal et al., 2005) propose the existence of refugia located north of the area heavily impacted by ice F I G U R E 6 Estimation of divergence time in Embothrium coccineum based on Bayesian coalescent analysis using SNAPP. Nodes with high support (posterior probability > .99) are filled in black. Median ages are provided above nodes with 95% highest posterior densities (HDP) below. The divergence time was inferred only for the nodes showing high support (posterior probability > .99).
Sampling location codes are as in Table S1; colored symbols for each sampling location match Figure 2a.

| Possible regional differences in pollen and seed dispersal: their effect on gene flow in Embothrium coccineum
Our analysis of the genetic differentiation drivers in E. coccineum identified a significant pattern of isolation by distance (IBD). The existence of IBD indicates that gene flow is limited as geographic distance increases (Wright, 1943). In plants such as Dactylis glomerata (Sun et al., 2017) and Protea rapens (Prunier et al., 2017), IBD patterns were detected at similar geographic scale, pointing out opportunities for increased allelic exchange between neighboring populations; and indeed some level of connectivity was also ob- Plant gene flow may occur via pollen and/or seed dispersal and is directly affected by the species' reproductive system (Martins, 1987). Seed dispersal decreases competition in areas close to the parent plant and allows the colonization of new sites (Rovere & Premoli, 2005). It also influences the genetic structure of populations since seed transport favors gene flow between and within populations (Hamrick & Nason, 1996), while it restricts potential intraspecific differentiation of ecotypes and subpopulations (Willson et al., 1994). The seeds of E. coccineum are dispersed by wind and the density of dispersed seeds declines with distance from the tree, for which 95% of the seeds disperse only 5 m from the mother tree (and up to a maximum of 20 m; Rovere & Premoli, 2005). These short dispersal distances could have significant genetic consequences, as the local establishment of propagules could result in a clustering of genetically related individuals, generating a marked genetic structure (Rovere & Premoli, 2005).
In E. coccineum, gene flow is highly dependent on pollinating agents (Mathiasen, 2004;Rovere & Chalcoff, 2010), and its flowers are visited by more than 20 species, including birds of the Orders Passeriformes and Apodiformes, and insects of the orders Hymenoptera, Diptera, Lepidoptera and Coleoptera (Chalcoff, 2008). The composition of the pollinator assemblage affects pollen transport and seed production. In regions with fewer E. coccineum pollinators, pollen limitation can be high, leading to low reproductive efficiency (i.e., few fruits produced in relation to the available flowers; Chalcoff, 2008). This could translate into populations presenting reduced genetic diversity and high genetic structure due to the lack of genetic exchange (Rovere & Chalcoff, 2010).
Pollinating insects, generally characterized by low migration capacities, are distributed throughout the distribution range of E. coccineum, even if they are more abundant and visit the plants more frequently in populations located in drier and sunnier climates (i.e., northern locations; Rovere & Chalcoff, 2010). Contrastingly, birds, such as the hummingbird Sephanoides sephaniodes, are abundant in the southern region, characterized by the coldest and most humid conditions where they act as one of the principal pollinators (Chalcoff, 2008). We propose that differences in the distribution and behavior of the pollinators between northern and centralsouthern regions could lead to the observed genetic structure in E. coccineum, with a lower dispersal capacity in the northern locations compared with the central-southern ones, leading to a higher level of structure in the former.
Other differences between regions, such as the level of landscape fragmentation or the difference in flowering times, could also generate differences in the level of gene flow (Rovere & Chalcoff, 2010). Climate has been shown to affect the flowering period of E. coccineum, for which the flowering period begins later (in November-December) in populations located at high latitudes and altitudes compared with the rest of the distribution (where it starts in September-October; Chalcoff, 2008;Hoffmann, 1997;Rovere & Chalcoff, 2010). Limited gene flow between populations presenting differences in flowering phenology could also lead to the observed pattern of divergence between and within genetic groups by limiting exchange between close populations located at contrasting altitudes. In Protea rapens, a widespread Proteaceae from the Cape Floristic Region in South Africa, a genetic split between eastern and western populations has been reported (Prunier et al., 2017).
Differences in the yearly distribution of rainfall between the east and the west were proposed as drivers of population differentiation and restricted geneflow, as these differences have a direct influence on the flowering period of P. rapens (Heelemann et al., 2008).

| Signatures of local adaptation in Embothrium coccineum
One of our goals was to differentiate between the effects of local adaptation and isolation by environment (IBE) and those of neutral processes, such as isolation by distance (IBD) or co-ancestry (IBA, linked to the species glacial history) on the genetic differentiation in E. coccineum. Our results confirmed that the genetic variation is, at least in part, driven by the adaptation to strong environmental gradients characteristic of the species' distribution range (Daniels & Veblen, 2000;Souto & Premoli, 2007;Souto & Smouse, 2014;Souto et al., 2009). Indeed, the redundancy analysis (RDA) indicated that IBE was a significant driver of population structure that explained the largest amount of among-population variation, even after controlling for IBD and IBA. These results were corroborated by the much higher divergence detected between the Northern and Central-Southern genetic groups of the NJ tree based on the genetic distance calculated only for outlier loci (putatively under selection) compared with the genetic distance calculated for all the loci genotyped.
Our gradientForest analysis identified the precipitation of the driest month as the most critical explanatory environmental variable for the pattern of genetic variation in E. coccineum. The level of precipitation of the driest month discriminated the Mediterranean biome to the north and the Patagonian steppe to the east (both with very low summer precipitation) from the temperate biome inhabited by E. coccineum (Escobar et al., 2006). Water stress is known to be an active driver of adaptive divergence in terrestrial plants and the level of precipitation is possibly the environmental variable with the highest impact on their survival (Allen et al., 2010;Engelbrecht et al., 2007). In addition, we detected other limiting factors influencing adaptive genetic variation in E. coccineum such as mean temperature of the wettest quarter and mean temperature of the driest quarter as well as elevation that together could act as drivers of natural selection among populations . The covariation between these environmental variables and latitude was supported by the fact that four of the putative outlier loci had the highest correlation with environmental factors. The allelic frequency of these loci thus changed gradually with latitude. These results emphasized that genetic differentiation in E. coccineum could be mainly associated with environmental differences among populations related to summer rainfall and winter temperature. Other studies have also found that climate and in particular precipitation lead to genetic outliers in plants and underpin the development of drought-adaptive phenotypic response (i.e., physiological and morphological traits) to the environment (Galliart et al., 2020;Gates et al., 2019).
Differences in morphology associated with environmental gradients are already known in E. coccineum (Chalcoff, 2008;Souto & Premoli, 2007) and further support the idea of possible local adaptation (current study). These studies also found that access to water strongly shaped the distribution of ecotypes in each region; where individuals grew as short trees of 0.5-2.5 m in height with small leaves in the driest climate of the northern region, while the same species reached more than 10 m height and developed longer leaves in the wet temperate forests of the central region (Souto & Smouse, 2014;Souto et al., 2009). Plants in areas with steady and high rainfall are less affected by heat and droughts, which may favor the production of larger, broader leaves to maximize the photosynthetic area at low risks of overheating and desiccation (Givnish, 1987;Hallik et al., 2009). In fact, the reduction in leaf size has been proposed as a key strategy to withstand water deficit (Baldocchi & Xu, 2007;Peguero-Pina et al., 2014). In dry environments, such as those found in the northern distribution of E. coccineum, leaves with low Specific Leaf Area (SLA) are more resistant to wilting (Cunningham et al., 1999) and last longer than leaves with high SLA, which helps the plant spare resources (Reich et al., 2003). Additionally, small, narrow leaves have less surface area, which reduces water loss, and their thinner boundary layers provide higher heat shedding and compensate for the absence of transpirational cooling (Fonseca et al., 2000;Yates et al., 2010). In Quercus faginea, a reduction in leaf size has been proposed as critical among Mediterranean oaks to withstand the water deficit characteristic of the region (Baldocchi & Xu, 2007, Peguero-Pina et al., 2014.

| CON CLUS I ON AND PROS PEC TS
Known patterns of morphological variations coupled with new results of genetic diversity and structure obtained during the present study revealed a long history of geographic isolation and local adaptation in E. coccineum from South American temperate forests.
Here, we observed a strong genetic structure in E. coccineum in part linked to local adaptation, especially related to the access to water during the driest months. Our results support previous findings that reported the existence of distinct ecotypes along the species' natural distribution. We further identified and quantified the environmental variables linked to genetic population structuring, which can aid management decisions, conservation, restoration, or reforestation purposes. For example, knowledge on the association of genotypes with environmental conditions can become crucial when selecting proper seeds, especially in species with locally adapted populations, as it is the case in E. coccineum. Moreover, as the environment changes, nonlocal seed sources may also become a considerable source for restoration or reforestation purposes and can be selected according to their match to the novel environment, if this information is available. Therefore, seed transfer guidelines can benefit from data on factors structuring the landscape of genetic variation. Many conservation efforts rely on delineating distinct populations for management but often ignore the continuous nature of landscape variation and its potential relationship to local adaptation or other processes that lead to genotype-environment associations. Concerning land management decisions, the present results advocate for seed transfer guidelines that should be restricted to each ecological zone. The strong genetic structure also offers an opportunity to further explore local adaptation among groups and exploit these for agroforestry. Bertin Jimenez for the trees' illustrations in the publication's summary diagram.

CO N FLI C T O F I NTE R E S T
On behalf of all authors, the corresponding author states that there is no conflict of interest.

This article has earned Open Data, Open Materials and Preregistered
Research Design badges. Data, materials and the preregistered design and analysis plan are available at https://www.ncbi.nlm.nih.gov/ sra/?term=PRJNA 783610.

DATA AVA I L A B I L I T Y S TAT E M E N T
The authors declare that all the work presented in this manuscript was done according to the standards of the journal Ecology and Marie-Laure Guillemin https://orcid.org/0000-0001-5703-7662